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Abstract 

Background: Recent advances in genome technologies and the subsequent collection of genomic information at 
various molecular resolutions hold promise to accelerate the discovery of new therapeutic targets. A critical step in 
achieving these goals is to develop efficient clinical prediction models that integrate these diverse sources of 
high-throughput data. This step is challenging due to the presence of high-dimensionality and complex interactions 
in the data. For predicting relevant clinical outcomes, we propose a flexible statistical machine learning approach that 
acknowledges and models the interaction between platform-specific measurements through nonlinear kernel 
machines and borrows information within and between platforms through a hierarchical Bayesian framework. Our 
model has parameters with direct interpretations in terms of the effects of platforms and data interactions within and 
across platforms. The parameter estimation algorithm in our model uses a computationally efficient variational Bayes 
approach that scales well to large high-throughput datasets. 

Results: We apply our methods of integrating gene/mRNA expression and microRNA profiles for predicting patient 
survival times to The Cancer Genome Atlas (TCGA) based glioblastoma multiforme (GBM) dataset. In terms of 
prediction accuracy, we show that our non-linear and interaction-based integrative methods perform better than 
linear alternatives and non-integrative methods that do not account for interactions between the platforms. We also 
find several prognostic mRNAs and microRNAs that are related to tumor invasion and are known to drive tumor 
metastasis and severe inflammatory response in GBM. In addition, our analysis reveals several interesting mRNA and 
microRNA interactions that have known implications in the etiology of GBM. 

Conclusions: Our approach gains its flexibility and power by modeling the non-linear interaction structures between 
and within the platforms. Our framework is a useful tool for biomedical researchers, since clinical prediction using 
multi-platform genomic information is an important step towards personalized treatment of many cancers. We have a 
freely available software at: http://odin.mdacc.tmc.edu/~vbaladan. 
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1 Introduction 

Recent advances in genome technologies such as microar- 
rays and next-generation sequencing have enabled the 
measurement of genomic activity at a very detailed 
resolution (e.g., base pair, single-nucleotide polymor- 
phisms) as well as across multiple molecular levels: the 
epigenome, transcriptome and proteome. The collec- 
tion of genomic information at various resolutions holds 
promise to accelerate the amalgamation of discovery sci- 
ence and clinical medicine [1]. One of the overarching 
goals of such studies is to relate these genomic data to rel- 
evant (patient-specific) clinical outcomes, not only to find 
significant biomarkers of disease progression/evolution 
but also to use the biomarkers to develop prediction 
models for deployment in future therapeutic studies. Fur- 
thermore, genomic data are now available from multiple 
platforms and resolutions for the same individual, thus 
allowing a researcher to simultaneously query these multi- 
ple sources of data to achieve these goals. Such motivating 
data have been collected under the aegis of The Cancer 
Genome Atlas (TCGA) project, wherein data from mul- 
tiple genomic platforms such as gene/mRNA expression, 
DNA copy number, methylation and microRNA expres- 
sion profiles are available for multiple tumor types (see 
http://cancergenome.nih.gov for more details). In addi- 
tion, the available clinical information, such as stage 
of disease and survival times, motivates the analytic 
frameworks that integrate patient-specific data. 

One of the main challenges in modeling the statisti- 
cal dependence between such high- throughput studies is 
that a large number of measurements (usually in thou- 
sands) is available for a relatively small number (usually 
in tens or hundreds) of patient samples; therefore, clas- 
sical statistical approaches based on linear models and 
hierarchical clustering are prone to over- fitting [2,3]. In 
these situations, [3] recommends accounting for high- 
dimensionality by using approaches that borrow infor- 
mation across covariates to compensate for the limited 
information available across samples, which leads to bet- 
ter and more reliable inference. Several approaches have 
been developed to address these challenges in various 
contexts. Some examples include linear parametric mod- 
els and hierarchical clustering for inferring the relation 
between phenotypes and genomic features [4], hierarchi- 
cal Bayesian modeling approaches based on linear shrink- 
age estimators [5], linear canonical correlation analysis 
[6], intensity-based approaches for merging datasets [7], 
and regularized linear regression approaches [8]. 

Although these approaches are computationally effi- 
cient, interpretable, and simple, they make two unrealistic 
assumptions for practical data analysis. First, due to the 
parametric and linear assumptions, they might miss the 
underlying non-linear patterns in the data. Second, and 
more importantly, these non-linear patterns are further 



amplified in the presence of complex interactions within 
and between the different platforms that must be mod- 
eled while integrating data from these platforms. In this 
paper, we present a statistical machine learning approach 
called the hierarchical relevance vector machines 
(H-RVM) to address these modeling and inferential chal- 
lenges. Briefly, the framework presented here: (a) models 
the relation between a relevant clinical outcome (scalar) 
and high-dimensional covariates/features through a 
data-adaptive and flexible nonparametric approach, (b) 
borrows information within and between platforms 
through a hierarchical Bayesian framework, (c) acknowl- 
edges and models the interaction between platforms 
through nonlinear kernel-based functionals, (d) has 
parameters that have explicit interpretation as the effects 
of the platforms and their interactions on the outcome, 
and (e) uses a computationally efficient variational Bayes 
approach that can be readily scaled to large datasets. 

Our methods are motivated by and applied to a TCGA 
based glioblastoma multiforme (GBM) dataset, for which 
we integrate gene (mRNA) and microRNA (miRNA) 
expression profiles to predict patient survival times a . 
There is an increasing interest in identifying subtypes of 
GBM based on its gene expression data. The ultimate goal 
of subtyping GBM is to identify gene expression profiles 
that are prognostic or predictive of treatment outcomes. 
The known subtypes of GBM samples in TCGA include 
pro-neural, neural, classical, and mesenchymal; with the 
first two classes of which are suspected to differ from the 
other two in the cell of origin, which is a critical deter- 
minant of effective treatment regimens [9]. Differential 
expressions of miRNAs were recently found to be associ- 
ated with many diseases, including cancers [10,11]. Pre- 
vious studies have shown that combining multiple types 
of data, such as mRNA and miRNA expressions, could 
significantly improve the accuracy of detecting GBM sub- 
types, and thereby potentially predict the clinical out- 
comes [12]. However, methods are lacking to accurately 
model the effect of interactions between these data types 
directly on clinical outcomes. Here we show that our non- 
linear and interaction-based integrative methods have 
better prediction accuracy than linear alternatives and 
non-integrative methods that do not account for the inter- 
actions between the platforms. We also find several prog- 
nostic mRNAs and microRNAs that are related to tumor 
invasion and that are known to drive tumor metastasis and 
severe inflammatory response in GBM. In addition, our 
analysis reveals several interesting mRNA-miRNA inter- 
actions that have known implications in the etiology of 
GBM. The paper is structured as follows. The basic con- 
struction of H-RVM is detailed in Section 2. The analysis 
of GBM data is presented in Section 3, and concluding 
remarks about the H-RVM framework are presented in 
Section 4. 
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2 Hierarchical Relevance Vector Machine model 

For ease of exposition, we illustrate the model build- 
ing process of H-RVM using data from two sources: 
gene/mRNA and miRNA expression measurements. The 
framework is easily extended to multiple platforms as 
discussed in Section 4. Suppose, we have data for N 
patients, and X and Y represent mean-centered and - 
standardized gene and miRNA expression matrices, with 
rows corresponding to patients and columns represent- 
ing the G genes and M miRNAs, respectively b . Center- 
ing and standardizing the gene and miRNA expression 
matrices remove any systematic mean or scaling effects 
caused by the use of different data sources, and make 
them compatible for model fitting. We denote the gene 
and miRNA expression for the i-th patient as row vec- 
tors xf = (xn, . . .,x iG ) and yf = (yn, . . . ,y iM )- These 
covariates are high-dimensional, that is, both G and M 
are much larger than N; for example, in the GBM data 
G « 12000, M « 540,7V « 250. Based on these measure- 
ments, our aim is to predict a relevant clinical outcome, 
which in our case is the (log-transformed) survival time 
measured from time of diagnosis to death, denoted by the 
column vector t = (ti, . . . , £#) for the N patients. 

2.1 Basic construction 

A basic (conceptual) model can be written in a high- 
dimensional regression setting as, 

ti = o(o+f x (x h a 1 )+f Y (Y h a2)+f( X 0 Y )(x h Y h a 3 )+e h (1) 

where oto is the overall mean effect and e; is the random 
error; /(•)% generally referred to as basis functions, are 
chosen to achieve a desired level of flexibility for model- 
ing the effects of X, Y, and their functions on t. Of these 
functions,/( x( g)y) models the interactions between X and Y, 
and the remaining basis functions,^ and f y , respectively, 
model the main effects of X and Y for predicting t. In most 
situations the regression coefficients, a = (a?o,ai,a2>a3)> 
linearly relate the covariate effects (i.e., values of the basis 
functions evaluated at the covariates) to the response. Lin- 
ear regression is a special case of (1) when all the basis 
functions are linear, and the response for the /-th patient, 

U = xf ai + yja 2 + (Xj <8> Yi) T ci3 + e h (2) 

where (x/®y/) = (xnyn, . . .,Xiiy iM > • • • ,XiGyn, • • -,XiGyiM) 
models the first order interactions between genes and 
miRNAs and ao = 0 because of the centered covariates. 
Further, due to the high-dimensional covariates x/s and 
y/s, a penalty is imposed on the regression coefficients 
a = (ai,a2,«3) to avoid overfitting. The most popular 
of such penalties is the Lasso because it has many desir- 
able properties for high-dimensional linear regression 
and variable selection [13,14]. Although (2) with a Lasso 
penalty is a popular choice for high-dimensional regres- 
sion, the linearity of the basis functions imposes serious 



restrictions on the flexibility of the model. For example, 
(2) does not model nonlinear covariate effects as well as 
second or higher order interactions between genes and 
miRNAs. 

Through H-RVM, we propose a regression model as 
a special case of (1), using kernel-based functions to 
respectively model f x >fy> and /( x <g>y). The kernel func- 
tions incorporate nonlinear effects of possible interactions 
within and between high-dimensional gene and miRNA 
expression measurements. Further, H-RVM estimates the 
respective contributions of genes, miRNAs, and their 
interactions in predicting survival times, which is of pri- 
mary importance in developing novel drug targets. H- 
RVM posits the following regression of t on X and Y for 
the i-th patient: 

U = ^]/x{x/,ai}+^ y {y/,a 2 }+ft/(x0y){( x ^ ) y)/> a 3}+Q ) 

(3) 

where (x <g> y); = (xn, . . .,XiG,yn, • • • 9 ym) is a vector 
of length G + M and 0 = (ft, #2,^3) is such that its 
components lie on a probability simplex i.e. /3 m > 0 for 
m = 1, 2, 3 and J2m=i Pm = 1. H-RVM posits different 
kernels for the data sources and combines them through 
weights fi. The model parameters have the following inter- 
pretation: 

• The kernel functions^ (•) and fy^) model all possible 
interactions among genes and among miRNAs, 
respectively, and/( x <g>y) (•) models all possible 
interactions between genes and miRNAs. The three 
kernels together account for the high-dimensionality 
and non-linearity of the covariate effects of X and Y 
by embedding them in the space of kernels. 

• The m-th component of f} m , denotes the influence 
of the m-th source on predicting the log survival time 
and has the following interpretation: if = {1, 0, 0}, 
then (3) corresponds to a functional regression model 
that predicts t (log survival time) with only x (gene 
expressions) as covariates. Conversely, if 

p = {1/3, 1/3, 1/3}, then (3) corresponds to a 
regression model, with the platforms and their 
interactions contributing equally to the prediction of 
the survival time. In reality, we expect (and show) 
different contributions from each platform and 
estimate these weights from the data. 

The task now remains to explicitly characterize the 
functions ^x(«),^y(») and /( x <g>y)( # ) using multiple kernels, 
as detailed below. 

2.2 Multiple kernel learning 

Kernel learning (KL) is an approach for nonparametric 
classification and regression that can be used for predict- 
ing t based on X and Y [14]. First, for simplicity, assume 
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that we want to predict t based on X. KL posits the 
following relation between t and X 

N 

U = a 0 + a j K ( x j> x^ 2 ) +€i^>t = K T a + € , (4) 

where a 2 is a kernel-specific "bandwidth" parameter and 
depends on the choice of kernel, 7<T(«) (detailed later in the 
section) and € = (ei, . . . is the (white-noise) error. 
The primary parameter of interest is a = (oto, . . . , a?jv) r , 
and ai, . . . ,otN correspond to weights assigned to the 
features for N x/s. Support Vector Machine (SVM) and 
Relevance Vector Machine (RVM) are canonical examples 
of KL [14]. We prefer RVM because of its probabilistic 
interpretation and other optimal properties compared to 
those of SVM [15]. There are cases, however, where one 
feature matrix may not fit the data well. Based on this 
observation, Multiple Kernel Learning (MKL) extends (4) 
and replaces K by a weighted average of L feature matrices 

N L 

7=1 1=1 & 

t = ftKf ot + . . . + PiKfa + . . . + p L K[a + e . 

MKL improves the flexibility of KL by introducing L 
bandwidth parameters {crf}^ and L weights for feature 
matrices ft = . . . , Pl) t . A variety of approaches 
exist to learn {a 2 }^ =1 , and a for MKL (for details see 
[14,16,17]). Note that in all these works the data source 
(i.e., X) remains the same for both KL and MKL. The H- 
RVM framework developed in this article extends KL to 
include multiple data sources and their interactions, and 
uses a learning algorithm similar to the MKL framework. 

Because the three data sources (gene expressions, 
miRNA expressions, and their interactions) can be used 
separately for predicting the log survival time, it is reason- 
able to combine their predictions to obtain more reliable 
estimates. To this end, H-RVM combines respective pre- 
dictions obtained from different sources obtained using 
KL (4) through a weighted average, and chooses appro- 
priate weights using MKL (5). Similar to (4), Kf a, K|a, 
and K^ot are the predicted values of t that correspond to 
genes, miRNAs, and their interactions, respectively. Using 
(5), we combine the predictions [Kfct}? =1 through the 
weight vector ft = fc, fa) to model t as 

t = (ftKf + £ 2 I<2 + &K 3 r )<* + * = Kja + e. (6) 

We further constrain ft such that its components lie 
on a probability simplex, i.e., Y^n=l = 1. This con- 
straint ensures that the joint (convolved) kernel, K^, is 
positive definite and that fi m denotes the influence of the 
m-th source in predicting the log survival time. Note that 
H-RVM is a special case of (3) with f x (*i, a) = k^.a, 



jy(y,-,a) = k£.a, and/ (x0y) ((x®y);,a) = k^-a, where k m)i 
is the i-th column of K m . Given {K/}? =1 , MKL can be used 
to learn a and 

Although similar to (5), (6) differs in two important 
ways. First, (6) obtains kernels using (4) for different 
data sources, namely gene expression, miRNA expression, 
and their interaction. Second, we allow for dependence 
between data sources via the interaction kernel (K3), but 
MKL does not; instead MKL uses a convex combination 
of the different kernels from the same data source to aid 
prediction. 

The learning algorithm of H-RVM is independent of 
the choice of kernels, but in this work we use a Gaussian 
radial basis function (RBF) kernel (denoted by K) [14]. The 
RBF kernel maps the m-th high-dimensional covariate to 
its feature space that is represented as feature matrix K m . 
The feature matrices Ki, K2, and K3 correspond to genes, 
miRNAs, and interactions, and their (/,/)-th entries are as 
follows: 

11 n2 
H x /- X ;ll 

(Ki)*/ = e 2 °i =K(x h Xj\a?), 

WYi-Yjf 

(K 2 )ij = e 2 *2 =K( yi -,y ; -|or 2 2 ), 

||(x(g)y)/-(x®y) 7 -|| 2 

(K 3 )ij = e 2ff 3 = K((x 0 y),., (x 0 y) ; |a 2 ), 

(7) 

where cr^ is the "bandwidth" parameter of the m-th ker- 
nel matrix and is chosen a priori through cross-validation 
(see [14] for details). The other choices of kernels include 
polynomial kernels and matern kernels [18]. To account 
for the overall mean (or intercept) in (1), an extra row of 
Is is appended to the feature matrices in (7); therefore, 
{K/}? =1 hereafter have dimensions (N + 1) x N. 

2.3 Generative Bayesian model for H-RVM 

H-RVM reformulates (6) as a hierarchical Bayesian model 
for greater flexibility and better interpretation of its 
parameters. This reformulation serves two important 
purposes. First, H-RVM is interpreted as a hierarchical 
Bayesian extension of RVM [15], which is a special case 
of Bayesian KL. Second, instead of using MKL methods, 
H-RVM learns parameters a and P from t, X, and Y using 
the variational learning algorithm of hierarchical kernel 
learning (HKL) [14,16]. 

H-RVM posits the following generative model for the 
(noisy) log survival time measurements t. Similar to MKL, 
K^a represents the mean of t. The error distribution is 
Gaussian with mean 0 and precision parameter y (8). 
Further, we impose a Gamma prior on y such that 

t|a, fi, 7, X, Y ~ Af(t\Kja, y' 1 !), (8) 
y ~ Gamma(y |c K ,d K ), (9) 
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where Af(.\/L, X) represents a multivariate Gaussian dis- 
tribution with mean fi and covariance matrix X and 
Gamma(.|c # , d m ) represents a Gamma distribution with 
respective shape and rate parameters c. and d.. 

Motivated by the "automatic relevance determination" 
idea of RVM, we impose independent Gaussian priors 
on the otj's with the same mean 0 and different precision 
parameters <p/s (10), where 0y controls (a priori) predic- 
tive power of the ;-th feature vector from the three data 
sources for the log survival time. A large 0y indicates low 
predictive power. We also impose independent Gamma 
priors on the 0y's, 

a |0 ~ N(pL\0,diag(4>- 1 )), (10) 

N 

0 ~ ]~[Gamma(0 7 -|c0,rf0), (11) 

where 0 = (0o, 0 X , . . . , 0«). This setting forces many ay's 
a posteriori to be near 0 with high precision. Most of the 
variance in t is explained by a small number of feature 
vectors that correspond to nonzero ay's. These feature vec- 
tors are the "relevance vectors" of H-RVM that have the 
following three characteristics: they prevent over-fitting, 
represent a parsimonious description of the data, and cor- 
respond to feature vectors that are most predictive of the 
log survival time. An equivalent prior setting is found by 
marginalizing the 0/s from the joint distribution of a and 
0 above, which imposes a multivariate Student's t prior on 
a with mean 0. 

Finally, we impose a Dirichlet prior on fi to ensure that 
its components lie on a probability simplex: 

fi = (Pi,lh,P3) ~ Dmcblet(fi\a l9 a 29 a 3 ) 9 (12) 

where the m-th component of fi, fi m , denotes the 
influence of m-th source in predicting the log survival 
time. 

The hierarchical Bayesian model (8) - (12) specifies a 
complete sampling model for the H-RVM framework. It 
can also be interpreted as a probabilistic approach for 
combining the predictions of log survival times from the 
three RVMs respectively corresponding to gene expres- 
sions, miRNA expressions, and their interactions. H-RVM 
introduces an additional hierarchy and combines the pre- 
dictions of these three RVMs as a weighted average, 
with the weights generated from a Dirichlet distribu- 
tion (12). The increased flexibility of H-RVM over RVM 
comes at the cost of analytic intractability of the poste- 
rior distributions of the H-RVM parameters. Estimation 
of the posterior distributions of the H-RVM's param- 
eters can proceed via either simulation-based Markov 
chain Monte Carlo (MCMC) approaches or determin- 
istic variational Bayes approaches. Given the complex- 
ity and size of high-throughput data in general and 



GBM data in particular, MCMC approaches tend to be 
computationally expensive and slow. We employ vari- 
ational Bayes methods from HKL [16] and obtain the 
analytically tractable variational posterior distribution, 
^(a,j8,0, y|t,X,Y,C0,d0,c y ,dy,^i,^2>^3)> that approxi- 
mates analytically intractable true posterior distribution, 
p(ct, fi,(j), y|t,X,Y, c^.d^.Cy.dy, ax.a^.a^). This approxi- 
mation achieves analytic tractability by assuming that 
a,/?,0, and y are independent under the variational 
posterior distribution. The analytic tractability leads to 
improved computational efficiency of the variational 
Bayes approach over sampling-based MCMC approaches. 
The derivations for variational posterior distributions are 
provided in Appendix A. 

3 Data analysis 

We apply the H-RVM approach to the GBM data as 
introduced in Section 1. GBM was one of the first can- 
cers evaluated by the TCGA. GBM data have multi- 
ple molecular measurements on over 500 samples that 
include gene expression, copy number, methylation and 
microRNA expression. TCGA datasets are available at 
http://tcga-data.nci.nih.gov/tcga/. The dataset we analyze 
here includes information about the gene expressions 
(11972 probes), miRNA expressions (534 probes), and 
(uncensored) survival times for matched patient samples 
(248). 

To remove the irrelevant noise variables before model 
fitting, we prescreened the gene and miRNA probes as 
follows. We performed univariate regression of the log 
survival times on the gene expression values, obtained 
p-values, and retained gene and miRNA probes that 
cross a liberal p-value threshold ( < 0.05 here) - to 
balance the practical and statistical significance. This 
pre-selection identifies 1747 and 43 gene expression and 
miRNA probes, respectively, for downstream modeling. 
All our analyses and comparisons were based on these 
selected gene and miRNA probes. 

We compare the predictions of H-RVM and three linear 
methods: penalized regressions (2) with the Lasso penalty 
[13] and with the following covariates: L gene expressions 
(Gene-Lasso), ii. miRNA expressions (MiRNA-Lasso), 
and Hi. both gene and miRNA expressions, and their first 
order interactions (Interaction-Lasso). We randomly split 
the GBM survival data into a training data and a test 
data with 223 (90%) and 25 (10%) patients, respectively. 
H-RVM, Gene-Lasso, MiRNA-Lasso, and Interaction 
Lasso are fit using the gene and miRNA expressions 
and log survival times in the training data. The varia- 
tional inference algorithm is used for fitting H-RVM (see 
Appendix A). The R package glmnet is used for the three 
penalized linear regressions [19,20]. The log survival times 
of the test data are predicted for the four methods using 
the model fits on the training data. The mean squared 
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prediction errors (MSPEs) are respectively calculated for 
the four models as the average of the squared difference 
between the observed and predicted values for the test 
data. This process of randomly splitting the GBM survival 
data into training+test data and fitting the four models is 
repeated 50 times. The results are summarized below. 

Figure 1 shows the prediction results for H-RVM, Gene- 
Lasso, MiRNA-Lasso, and Interaction-Lasso using the 
kernel density estimates (KDEs) of the MSPEs for the 50 
random splits. The KDEs of the MSPEs for H-RVM is 
shifted to lower values than those for the three penalized 
linear regressions. The KDEs of the MSPEs for Gene- 
Lasso, MiRNA-Lasso, and Interaction-Lasso are close 
to each other, which implies that the MSPEs for these 
models are fairly similar. Two observations arise from 
these results. First, the results indicate that penalized 
linear regression with the Lasso penalty does not lead 
to improved performance after accounting for interac- 
tions among covariates, which has been well-established 
in literature [19]. Second, the prediction results of the 
penalized linear regressions do not improve after model- 
ing the first order interactions among genes and miRNAs, 
thus indicating the presence of non-linear genomic effects 
and second or higher order interactions among them. 
For this case study, we see that H-RVM performs bet- 
ter than the penalized linear regression methods. This 
may be because of H-RVM accounts for the nonlinear 



effects of genes and miRNAs and models the interactions 
within genes, within miRNAs, and between genes and 
miRNAs. Further, because we model the log survival time, 
the gain for survival time predictions is, in fact, expo- 
nentially higher for H-RVM compared to those for the 
other methods. 

We compared the performance of the predictions of the 
log survival times from H-RVM and the observed survival 
times using Kaplan-Meier estimates of the survival curves. 
We used the R package survival to perform the log 
rank test and estimate the Kaplan-Meier survival curves 
[21]. Figure 2 compares the survival probability curves of 
the log survival times of patients predicted to be in the 
long and short survival groups by H-RVM. The patients 
are assigned to the long and short survival groups based 
on a median cut-off of the predicted log survival times 
obtained from H-RVM. The p-value of the log rank test 
that the two survival curves are the same is close to 0, 
indicating that the survival group predictions of H-RVM 
closely agree with the observed survival groups of the 
patients. In addition, Figure 3 compares the actual survival 
probability curves of the observed and predicted log sur- 
vival times of patients in the test data with the minimum 
MSPE. The p- values and the survival probability curves 
indicate that the log survival time predictions of H-RVM 
agree closely agree with the observed log survival times, 
as well. 

One of the additional gains of our modeling frame- 
work is the determination of which platform has a more 
profound influence on predicting the response, as cap- 
tured by the weight parameter Figure 4 shows the 
estimates of the weights for predicting the log sur- 
vival time of the patients for gene expression, miRNA 
expressions, and their interactions obtained from H- 
RVM. The medians of the weights (25% and 75% quartiles) 
for the three data sources are 0.239 (0.113 and 0.360), 
0.504 (0.408 and 0.583), and 0.201 (0.108 and 0.404), 
respectively. Interestingly, H-RVM shows that miRNAs 
have better predictive power than genes in predicting 
the log survival times of patients in the GBM data. The 
nonzero weight for interactions between gene and miRNA 
expressions further confirms the presence of nonlinear 
interactions. 

To gain biological insights into our results, we per- 
formed a functional analysis of our model fitting results. 
We used Ingenuity Pathway Analysis software to perform 
functional analysis on selected significant genes used in 
fitting H-RVM. We used targetHub [22] to find the known 
and predicted interactions between significant genes and 
miRNAs. mirTarBase, a curated database of experimen- 
tally validated miRNA targets, was our choice as a source 
of known gene and miRNA interactions [23] . To identify 
the predicted gene and miRNA interactions, we used tar- 
getScan data [24] to filter out miRNA-gene interactions 
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Figure 1 Kernel density estimates. Kernel density estimates (KDEs) 
of mean square prediction errors (MSPEs) for H-RVM, Gene-Lasso, 
MiRNA-Lasso, and Interaction-Lasso. The GBM survival data is 
randomly split 50 times into training and test data, all four models are 
fit on the training data, and MSPEs for log survival times are calculated 
for the test data using the model fit on training data. The x-axis 
represents the MSPEs and the y-axis represents the respective KDEs 
for H-RVM (in solid red), Gene-Lasso (in dotted blue), MiRNA-Lasso 
(in dotted and dashed blue), and Interaction-Lasso (in dashed blue). 
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Figure 2 Survival probability curves for TCGA data. Survival probability curves for log survival time in theTCGA GBM data. The solid lines are the 
Kaplan-Meier estimates of survival probabilities for the patients predicted to have long survival times (in blue) and for the patients predicted to have 
short survival times (in red), respectively. The patients are assigned to the long and short survival groups based on the estimates of log survival times 
obtained from H-RVM. The dotted lines indicate point-wise 95% confidence intervals for the survival probabilities. The p-value of the log rank test is 0. 



in which the miRNA and gene effects on survival were 
concordant, since discordant behavior is expected in bio- 
logical systems for a direct interaction between miRNA 
and its targets. 

Pathway analyses indicates that the anti-survival genes 
(i.e., genes with negative effects on survival times) are 
enriched with signaling pathways related to tumor inva- 
sion (see Figure 5). HMGB1 and TWEAK signaling 



pathways, which are known to drive tumor metastasis 
and severe inflammatory responses in GBM and other 
cancers, are associated with these genes [25-28]. Pro- 
survival genes are represented by PDGF, PTEN and other 
signaling pathways. It is well-established that the PDGF 
signaling pathway dominates the proneural subgroup, 
which correlates with a good survival time for patients 
with GBM [29]. The functional terms cellular movement 



P-value = 0.0964 



Observed survival time 
Predicted survival time 




log Survival time 

Figure 3 True and predicted survival probability curves. Survival probability curves for the observed log survival time and predicted log survival 
time (using H-RVM) of the patients in the test data with minimum mean square prediction error. The solid lines are the Kaplan-Meier estimates of 
survival probabilities for the predicted (in blue) and observed (in red) log survival times in the test data. The dotted lines indicate point-wise 95% 
confidence intervals for the survival probabilities. The p-value of the log rank test is 0.0964. 
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Figure 4 Boxplots of weights. Boxplots of weights fi = (/3i , of gene expressions, miRN A expressions, and their interactions in predicting 

log survival time. The GBM survival data is randomly split 50 times into training and test data, H-RVM is fit on the training data, and fl is obtained 
from the fit on training data. The y-axis shows the distributions of respective weights for gene expressions, miRNA expressions, and their interactions 
in predicting log survival time of patients across 50 random splits of the GBM survival data. 



and cell-to-cell signaling and interaction pathways are 
enriched for anti-survival genes, reinforcing their role in 
invasive GBM. 

The target analysis of miRNA revealed 22 known inter- 
actions between 8 miRNAs and 20 genes, as shown 
in Table 1. Four of these eight miRNAs (hsa-miR-31, 
hsa-miR-146b, hsa-miR-221 and hsa-miR-222) were pre- 
viously identified as anti-survival markers of GBM [30]. 



Mir-21 is an established marker of GBM and is known 
to target many tumor suppressor genes [31]. Mir-34a 
expression is higher in other GBM subtypes compared to 
that in the pro-survival proneural glioma subtype [32]. 
The anti-survival patterns of all these miRNAs indicate 
that these gene and miRNA interactions can be tar- 
geted for therapy of GBM subgroups with expected poor 
survival. We also identified 1006 predicted interactions 




© 2000-2013 Ingenuity Systems. Inc. All fights reserved. 

Figure 5 Comparison of the signaling pathways. Comparison of the signaling pathways associated with significant prognostic genes in 
Glioblastoma multiforme. 
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Table 1 List of known gene-microRNA interactions 
identified as significant in the H-RVM model using target 
analysis 

Gene symbols microRNA 



FOXP3,YY1,KLF13,ETS1 
F0X03, DDIT4 
ATAT1 
F0X03 

FGG , CPEB3, FGB, PIK3R1 
PDGFB 

PDCD4, TOPORS, BASP1, MARCKS, TP53BP2 
SIRT1,YY1,E2F3,CDC25C 



hsa-mir-31 
hsa-mir-221 
hsa-mir-23a 
hsa-mir-222 
hsa-mir-29a 
hsa-mir-146b 

hsa-mir-21 
hsa-mir-34a 



(by TargetScan) between 31 miRNA and 484 genes that 
are significant (see Additional file 1). 

4 Conclusions and future work 

We have presented an integrative framework, H-RVM, 
that generalizes the multiple kernel learning frame- 
work for integrating high-dimensional data from multi- 
ple sources, incorporating within and between platform 
interactions to develop a prediction model for clinical out- 
comes. We applied H-RVM to a high-dimensional TCGA 
GBM data to predict patient survival times using two data 
sources: gene and miRNA expressions, and found that the 
predictive performance of H-RVM is better than those 
of linear methods that do not model nonlinear effects 
and interactions. We hypothesize that H-RVM gains flex- 
ibility and power by modeling the non-linear interaction 
structures between gene and miRNA expressions. H-RVM 
will be a useful tool for biomedical researchers, as clini- 
cal prediction using multi-platform genomic information 
is an important step towards identifying personalized 
treatments for many cancers. We have code for fitting H- 
RVM that is freely available at the corresponding authors 
website (see Additional file 2). 

Although we have presented the application of H-RVM 
in the context of two platforms, the framework is general 
and can be extended and adapted to data from multi- 
ple platforms with different distributional assumptions. 
This will essentially entail a generalization of the H-RVM 
model by assuming additional terms for the different 
platforms. One key issue that warrants further investi- 
gation is an increase in the number of (multiplicative) 
between-platform interaction terms. We used the compu- 
tationally efficient variational Bayes methods, which are 
extremely useful for handling large datasets from projects 
such as TCGA. In addition, [17] presents more scalable 
versions of HKL and MKL that can be adapted to our 
framework. Our future work will concentrate on extend- 
ing the H-RVM framework using Bayesian spike and slab 
priors to select variables from the interacting covariates 



before embedding the data in the space of kernels, as 
well as incorporating uncertainty estimations of the scale 
parameters - thus aiding the joint model building process. 

Endnotes 

a We use gene and mRNA interchangeably to mean 
transcript-level expression. 

b We use bold lowercase and uppercase alphabets to 
denote column vectors and matrices, respectively. 

A Appendix: Variational inference for H-RVM 

Following the hierarchic kernel learning algorithm (HKL) 
of [16], we provide the derivation for the variational 
inference algorithm that estimates the variational pos- 
terior distributions for the parameters of H-RVM. Our 
interest lies in the posterior distributions of a and P that 
are obtained using the Bayesian model (8) - (12). Unlike 
RVM, the posterior distributions of a and P in H-RVM 
are analytically intractable. There are several techniques 
that can be used to obtain these posteriors distributions. 
We employ the variational Bayes methods from HKL 
[16] and obtain analytically tractable variational posterior 
distribution, q(a, P, 0, y \t,X,Y,c ( p,d ( f ) ,Cy,dy,ai,a2,a3), 
that approximates analytically intractable true posterior 
distribution, p(a, P, 0, y |t, X, Y, c^, c Y1 dy 1 a\,a2, a<$). 
The variational approach minimizes the Kullback- 
Liebler (KL) divergence between q(a, P, 0, y\t, 
X, Y, c^, d<f>, c y , d y , a\, a^, az) and p(ci,p, 0, y|t, X, Y, 
c ( / ) ,d ( f ) ,Cy,dy,ai,a2,a3). This approximation achieves 
analytic tractability by assuming that a,p, 0, and y 
are independent under 0, y|t,X, Y, c^.d^.Cy, 

dy,a\, ci2, a<$). Therefore, 



q(a, P,(/),y |t, X, Y, c^, d^, c Y ,d Y ,a\ y <z 2 > ^3) 
= q(a)q(P)q(4>)q(y), 



(13) 



where we have suppressed the conditioning on the 
data and hyperparameters for the variational pos- 
teriors on the right. Notice that the factorization 
(13) alone guarantees the analytic tractability of 
q(ot, f$, 0, /|t,X, X,c^d^,c Y ,d Y ,a\,a2,a^), and we do 
not assume any distributional form for the q's. Following 
[16] and [14], the variational posterior distributions are 
derived as 

logtf(a) ocEpj t y[p(ci,p,4>,Y,t,X,Y\ 

cq, d(p, Cy,dy,a\, d2> a^)\ (14) 
\o%q(fi) oc E a ,0, y [/?(«, /J, 0,y,t,X,Y| 

c ( f> t d ( f ) ,Cy,dy,ai,a2,a3)], (15) 
log#(0) ocE a ^ K [/7(a,^,0,y,t,X,Y| 

c^, d<f„ Cy,dy,a\, ci2, a^)\ (16) 
log#(y) oc E a>i 8,0 [/?(«,/?, 0,y,t,X,Y| 

c ( f )J d ( j )J Cy J dy J ai J a2,a3)]> (17) 
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where all expectations are with respect to the variational 
posterior distributions. Hereafter, we will denote E # [/] as 
{/% for notational simplicity. 

Following [16] and using (14), the variational posterior 
distribution of a, 



E a = [(Y)y(K fi Kj) fi +diagm+)]~\ ( 18 ) 



Following [16] and using (16), the variational posterior 
distribution of 0, 



N 

<7(0) = ]~[^(0y), where 

7=0 

q{(j)j) = Gamma^+C0, i(a?) a +^0 ),for/= 1,...,N. 

(19) 



Following [16] and using (17), the variational posterior 
distribution of y, 



q{y) = Gamma + c y , i (||e|| 2 ) a ^ + d^ t where ^ 
llell 2 = lit- Kla II 2 . 



All the expectations above are determined using 
(a) at (act T )a> (0)0, and (y) y , which are available from the 
distributional forms of q(a), #(0), and q(y) in (18) - (20). 
Specifically, 



(a) a = {U0L T )cL = ?>a + > 



(0;>0 ; - 



2 +C0 



5((S a ) ; y+(^ a ^);;) + ^ 

2 +c y 



2<lleH 2 W+^ 



N 



N 



3 3 
m=l 1=1 

N 



Instead of q(P), its non-normalized version q*(P) is 
available from [16] as 

tfifi) = 11 C m_lex P (" (P T W - W T b)j, where 



N 



&ml = X tfn n {tt<x T )ctki n , for m, / = 1,2,3 and 



w=i 

b m = (a r ) a K m t, form = 1,2,3. 

(21) 

Following [16], calculate (P)p, (\o%P)p, and {PP T )p, as 
follows. Sample «S /?s from Dirichlet(<2i,<3_,<23) and esti- 
mate the expectations as (f(P))p ^ J2s=if(Ps) w (Ps) 
where f(P) = P, log P, and/?/? 7 respectively, and 



exp(-^(/? s T fi/? 5 -2/? s r b)) 
Ef =1 exp(-^(^.-2/jfb))' 



The analytic tractability of q(a, P, 0, y |t, X, Y, cq, d^, c y , 
<i K ,^i,^2>^3) in variational inference guarantees that the 
marginal variational distribution (or likelihood) of the 
data q(t,X,Y\c ( f ) ,d ( i ) ,c y ,d y ,ai,a2,a3) is also analytically 
tractable. Estimate hyperparameters c ( j )1 d ( j )1 c yi d yi ai 1 a2 1 
and as as 

arg max log#(t, X, Y|c^, d<f> t c yt d y , a± t a^ai), 

C(/),d^,Cy ,d y ,a\ ,a2,a^ 

which is the type II maximum likelihood procedure as 
recommended in [16]. The kernel parameters {of}? =1 
are learned respectively from three RVMs for each of 
the three sources using cross-validation as recommended 
by [15]. 



Additional files 



n=l 



Additional file 1: mRNA-miRNA-predicted-interactions.xlsx. Excel file 
containing all predicted mRNA and microRNA interactions flagged as 
significant in our analysis. The file is available at: http://odin.mdacc.tmc. 
edu/~vbaladan/Veera_Home_Page/Software_files/mRNA-miRNA- 
predicted-interactions.xlsx. 

Additional file 2: hrvm-0.1 .1 .tar.gz. R package for fitting H-RVM 
available at: http://odin.mdacc.tmc.edu/~vbaladan/Veera_Home_Page/ 
Software_files/hrvm_0.1 .1 .tar.gz. 
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